{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Saving `Datasets` and `DataArrays` to NetCDF\n",
    "\n",
    "## Objectives\n",
    "\n",
    "Introduce an easy method for saving `Datasets` and `DataArrays` objects to NetCDF\n",
    "\n",
    "## Introduction\n",
    "\n",
    "Saving your `Datasets` and `DataArrays` objects to NetCDF files couldn't be simpler.  The `xarray` module that we've been using to load NetCDF files provides methods for saving your `Datasets` and `DataArrays` as NetCDF files.\n",
    "\n",
    "Here is the manual page on the subjet: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html\n",
    "\n",
    "The method `._to_netcdf( )` is available to **both** `Datasets` and `DataArrays` objects.  So useful!\n",
    "\n",
    "### Syntax\n",
    "``\n",
    "your_dataset.to_netcdf('/your_filepath/your_netcdf_filename.nc')\n",
    "``\n",
    "\n",
    "## Saving an existing `Dataset` to NetCDF\n",
    "\n",
    "First, let's set up the environment and load a `Dataset`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import xarray as xr\n",
    "import sys\n",
    "import matplotlib.pyplot as plt\n",
    "import json\n",
    "import sys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Import the ecco_v4_py library into Python\n",
    "## =========================================\n",
    "\n",
    "#import ecco_v4_py as ecco\n",
    "\n",
    "## -- If ecco_v4_py is not installed in your local Python library, \n",
    "##    tell Python where to find it.  For example, if your ecco_v4_py\n",
    "##    files are in /Users/ifenty/ECCOv4-py/ecco_v4_py, then use:\n",
    "sys.path.append('/home/ifenty/ECCOv4-py')\n",
    "import ecco_v4_py as ecco"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "load a single tile, monthly averaged *THETA* for March 2010 for model tile 2. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Set top-level file directory for the ECCO NetCDF files\n",
    "## =================================================================\n",
    "# base_dir = '/home/username/'\n",
    "base_dir = '/home/ifenty/ECCOv4-release'\n",
    "\n",
    "## define a high-level directory for ECCO fields\n",
    "ECCO_dir = base_dir + '/Release3_alt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "## LOAD NETCDF FILE\n",
    "## ================\n",
    "\n",
    "# directory of the file\n",
    "data_dir= ECCO_dir + '/nctiles_monthly/THETA/'\n",
    "\n",
    "# filename\n",
    "fname = 'THETA_2010.nc'\n",
    "\n",
    "# load the dataset file using xarray\n",
    "theta_dataset = xr.open_dataset(data_dir + fname).load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we've loaded *theta_dataset*, let's save it in the **/tmp** file directory with a new name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "saving to  ./test_output.nc\n",
      "finished saving\n"
     ]
    }
   ],
   "source": [
    "new_filename_1 = './test_output.nc'\n",
    "print ('saving to ', new_filename_1)\n",
    "theta_dataset.to_netcdf(path=new_filename_1)\n",
    "print ('finished saving')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*It's really that simple!*\n",
    "\n",
    "## Saving a new custom ``Dataset`` to NetCDF\n",
    "\n",
    "\n",
    "Now let's create a new custom `Dataset` that with *THETA*, *SSH* and model grid parameter variables for a few tiles and depth level 10."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "loading files of  THETA\n",
      "loading files of  SSH\n"
     ]
    }
   ],
   "source": [
    "data_dir= ECCO_dir + '/nctiles_monthly/'\n",
    "\n",
    "SSH_THETA_201003 = \\\n",
    "    ecco.recursive_load_ecco_var_from_years_nc(data_dir, \\\n",
    "                                              ['SSH', 'THETA'], \\\n",
    "                                              tiles_to_load = [0,1,2],\n",
    "                                              years_to_load = 2010)\n",
    "grid_dir = ECCO_dir + '/nctiles_grid/'    \n",
    "grid = ecco.load_ecco_grid_nc(grid_dir, 'ECCOv4r3_grid.nc')\n",
    "grid.close()\n",
    "\n",
    "custom_dataset = xr.merge([SSH_THETA_201003, grid])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and now we can easily save it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "saving to  ./test_output_2.nc\n",
      "finished saving\n"
     ]
    }
   ],
   "source": [
    "new_filename_2 = './test_output_2.nc'\n",
    "print ('saving to ', new_filename_2)\n",
    "custom_dataset.to_netcdf(path=new_filename_2)\n",
    "custom_dataset.close()\n",
    "print ('finished saving')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:    (i: 90, i_g: 90, j: 90, j_g: 90, k: 50, k_l: 50, k_p1: 51, k_u: 50, nv: 2, tile: 13, time: 12)\n",
       "Coordinates:\n",
       "  * tile       (tile) int64 0 1 2 3 4 5 6 7 8 9 10 11 12\n",
       "  * j          (j) int32 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89\n",
       "  * i          (i) int32 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89\n",
       "  * k          (k) int32 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49\n",
       "    Z          (k) float32 -5.0 -15.0 -25.0 -35.0 ... -5039.25 -5461.25 -5906.25\n",
       "    PHrefC     (k) float32 49.05 147.15 245.25 ... 49435.043 53574.863 57940.312\n",
       "    drF        (k) float32 10.0 10.0 10.0 10.0 10.0 ... 387.5 410.5 433.5 456.5\n",
       "    XC         (tile, j, i) float32 -111.60647 -111.303 ... -111.86579\n",
       "    YC         (tile, j, i) float32 -88.24259 -88.382515 ... -88.07871 -88.10267\n",
       "    rA         (tile, j, i) float32 362256450.0 363300960.0 ... 361119100.0\n",
       "    hFacC      (tile, k, j, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0\n",
       "    time_bnds  (time, nv) datetime64[ns] dask.array<chunksize=(12, 2), meta=np.ndarray>\n",
       "    iter       (time) int32 dask.array<chunksize=(12,), meta=np.ndarray>\n",
       "  * time       (time) datetime64[ns] 2010-01-16T12:00:00 ... 2010-12-16T12:00:00\n",
       "  * i_g        (i_g) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89\n",
       "  * j_g        (j_g) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89\n",
       "  * k_u        (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49\n",
       "  * k_l        (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49\n",
       "  * k_p1       (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 42 43 44 45 46 47 48 49 50\n",
       "    XG         (tile, j_g, i_g) float32 -115.0 -115.0 ... -102.928925 -108.95171\n",
       "    YG         (tile, j_g, i_g) float32 -88.17569 -88.31587 ... -88.02409\n",
       "    CS         (tile, j, i) float32 0.06157813 0.06675376 ... -0.9983638\n",
       "    SN         (tile, j, i) float32 -0.99810225 -0.9977695 ... -0.057182025\n",
       "    Zp1        (k_p1) float32 0.0 -10.0 -20.0 -30.0 ... -5244.5 -5678.0 -6134.5\n",
       "    Zu         (k_u) float32 -10.0 -20.0 -30.0 -40.0 ... -5244.5 -5678.0 -6134.5\n",
       "    Zl         (k_l) float32 0.0 -10.0 -20.0 -30.0 ... -4834.0 -5244.5 -5678.0\n",
       "    dxG        (tile, j_g, i) float32 15584.907 15589.316 ... 23142.107\n",
       "    dyG        (tile, j, i_g) float32 23210.262 23273.26 ... 15595.26 15583.685\n",
       "    Depth      (tile, j, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0\n",
       "    rAz        (tile, j_g, i_g) float32 179944260.0 180486990.0 ... 364150620.0\n",
       "    dxC        (tile, j, i_g) float32 15583.418 15588.104 ... 23406.256\n",
       "    dyC        (tile, j_g, i) float32 11563.718 11593.785 ... 15578.138\n",
       "    rAw        (tile, j, i_g) float32 361699460.0 362790240.0 ... 364760350.0\n",
       "    rAs        (tile, j_g, i) float32 179944260.0 180486990.0 ... 364150620.0\n",
       "    drC        (k_p1) float32 5.0 10.0 10.0 10.0 ... 399.0 422.0 445.0 228.25\n",
       "    PHrefF     (k_p1) float32 0.0 98.1 196.2 ... 51448.547 55701.18 60179.445\n",
       "    hFacW      (k, tile, j, i_g) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0\n",
       "    hFacS      (k, tile, j_g, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0\n",
       "    maskC      (k, tile, j, i) bool False False False ... False False False\n",
       "    maskW      (k, tile, j, i_g) bool False False False ... False False False\n",
       "    maskS      (k, tile, j_g, i) bool False False False ... False False False\n",
       "    maskCtrlW  (k, tile, j, i_g) bool False False False ... False False False\n",
       "    maskCtrlS  (k, tile, j_g, i) bool False False False ... False False False\n",
       "    maskCtrlC  (k, tile, j, i) bool False False False ... False False False\n",
       "Dimensions without coordinates: nv\n",
       "Data variables:\n",
       "    THETA      (time, tile, k, j, i) float32 dask.array<chunksize=(12, 13, 50, 90, 90), meta=np.ndarray>\n",
       "    SSH        (time, tile, j, i) float32 dask.array<chunksize=(12, 13, 90, 90), meta=np.ndarray>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "custom_dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Verifying our new NetCDF files\n",
    "\n",
    "To verify that ``to_netcdf()`` worked, load them and compare with the originals.\n",
    "\n",
    "### Compare *theta_dataset* with *dataset_1*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the first test dataset\n",
    "dataset_1 = xr.open_dataset(new_filename_1)\n",
    "\n",
    "# release the file handle (not necessary but generally a good idea)\n",
    "dataset_1.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `np.allclose` method does element-by-element comparison of variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "checking THETA \n",
      "-- identical in dataset_1 and theta_dataset : True\n"
     ]
    }
   ],
   "source": [
    "# loop through the data variables in dataset_1\n",
    "for key in dataset_1.keys():\n",
    "    print ('checking %s ' % key)\n",
    "    print ('-- identical in dataset_1 and theta_dataset : %s' % \\\n",
    "           np.allclose(dataset_1[key], theta_dataset[key], equal_nan=True))\n",
    "    \n",
    "# note: ``equal_nan`` means nan==nan (default nan != nan)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*THETA* is the same in both datasets.\n",
    "\n",
    "### Compare *custom_dataset* with *dataset_2*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "finished loading\n"
     ]
    }
   ],
   "source": [
    "# our custom dataset\n",
    "dataset_2 = xr.open_dataset(new_filename_2)\n",
    "dataset_2.close()\n",
    "print ('finished loading')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "checking THETA \n",
      "-- identical in dataset_2 and custom_dataset : True\n",
      "checking SSH \n",
      "-- identical in dataset_2 and custom_dataset : True\n"
     ]
    }
   ],
   "source": [
    "for key in dataset_2.keys():\n",
    "    print ('checking %s ' % key)\n",
    "    print ('-- identical in dataset_2 and custom_dataset : %s'\\\n",
    "           % np.allclose(dataset_2[key], custom_dataset[key], equal_nan=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*THETA* and *SSH* are the same in both datasets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So nice to hear!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Saving the results of calculations\n",
    "\n",
    "### Calculations in the form of `DataArrays`\n",
    "Often we would like to store the results of our calculations to disk.  If your operations are made at the level of `DataArray` objects (and not the lower `ndarray` level) then you can use these same methods to save your results.  All of the coordinates will be preserved (although attributes be lost).  Let's demonstrate by making a dummy calculation on SSH\n",
    "\n",
    "$$SSH_{sq}(i) = SSH(i)^2$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'SSH' (time: 12, tile: 13, j: 90, i: 90)>\n",
       "dask.array<mul, shape=(12, 13, 90, 90), dtype=float32, chunksize=(12, 13, 90, 90), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * tile     (tile) int64 0 1 2 3 4 5 6 7 8 9 10 11 12\n",
       "  * j        (j) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89\n",
       "  * i        (i) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89\n",
       "    XC       (tile, j, i) float32 -111.60647 -111.303 ... -105.58465 -111.86579\n",
       "    YC       (tile, j, i) float32 -88.24259 -88.382515 ... -88.07871 -88.10267\n",
       "    rA       (tile, j, i) float32 362256450.0 363300960.0 ... 361119100.0\n",
       "    iter     (time) int32 158532 159204 159948 160668 ... 165084 165804 166548\n",
       "  * time     (time) datetime64[ns] 2010-01-16T12:00:00 ... 2010-12-16T12:00:00\n",
       "    CS       (tile, j, i) float32 0.06157813 0.06675376 ... -0.9983638\n",
       "    SN       (tile, j, i) float32 -0.99810225 -0.9977695 ... -0.057182025\n",
       "    Depth    (tile, j, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "SSH_sq = custom_dataset.SSH * custom_dataset.SSH\n",
    "\n",
    "SSH_sq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*SSH_sq* is itself a `DataArray`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before saving, let's give our new *SSH_sq* variable a better name and descriptive attributes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'SSH^2' (time: 12, tile: 13, j: 90, i: 90)>\n",
       "dask.array<mul, shape=(12, 13, 90, 90), dtype=float32, chunksize=(12, 13, 90, 90), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * tile     (tile) int64 0 1 2 3 4 5 6 7 8 9 10 11 12\n",
       "  * j        (j) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89\n",
       "  * i        (i) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89\n",
       "    XC       (tile, j, i) float32 -111.60647 -111.303 ... -105.58465 -111.86579\n",
       "    YC       (tile, j, i) float32 -88.24259 -88.382515 ... -88.07871 -88.10267\n",
       "    rA       (tile, j, i) float32 362256450.0 363300960.0 ... 361119100.0\n",
       "    iter     (time) int32 158532 159204 159948 160668 ... 165084 165804 166548\n",
       "  * time     (time) datetime64[ns] 2010-01-16T12:00:00 ... 2010-12-16T12:00:00\n",
       "    CS       (tile, j, i) float32 0.06157813 0.06675376 ... -0.9983638\n",
       "    SN       (tile, j, i) float32 -0.99810225 -0.9977695 ... -0.057182025\n",
       "    Depth    (tile, j, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0\n",
       "Attributes:\n",
       "    long_name:  Square of Surface Height Anomaly\n",
       "    units:      m^2"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "SSH_sq.name = 'SSH^2'\n",
    "SSH_sq.attrs['long_name'] = 'Square of Surface Height Anomaly'\n",
    "SSH_sq.attrs['units'] = 'm^2'\n",
    "\n",
    "# Let's see the result\n",
    "SSH_sq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "much better!  Now we'll save."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "saving to  ./ssh_sq_DataArray.nc\n",
      "finished saving\n"
     ]
    }
   ],
   "source": [
    "new_filename_3 = './ssh_sq_DataArray.nc'\n",
    "print ('saving to ', new_filename_3)\n",
    "\n",
    "SSH_sq.to_netcdf(path=new_filename_3)\n",
    "print ('finished saving')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Calculations in the form of `numpy ndarrays`\n",
    "\n",
    "If calculations are made at the `ndarray` level then the results will also be `ndarrays`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "numpy.ndarray"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "SSH_dummy_ndarray = custom_dataset.SSH.values *  custom_dataset.SSH.values\n",
    "\n",
    "type(SSH_dummy_ndarray)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You'll need to use different methods to save these results to NetCDF files, one of which is described here: http://pyhogs.github.io/intro_netcdf4.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "Saving `Datasets` and `DataArrays` to disk as NetCDF files is fun and easy with ``xarray``!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
